Are Very Large Context-Free Grammars Tractable?
نویسندگان
چکیده
In this paper, we present a method which, in practice, allows to use parsers for languages defined by very large context-free grammars (over a million symbol occurrences). The idea is to split the parsing process in two passes. A first pass computes a sub-grammar which is a specialized part of the large grammar selected by the input text and various filtering strategies. The second pass is a traditional parser which works with the subgrammar and the input text. This approach is validated by practical experiments performed on a Earley-like parser running on a test set with two large context-free grammars.
منابع مشابه
Studying impressive parameters on the performance of Persian probabilistic context free grammar parser
In linguistics, a tree bank is a parsed text corpus that annotates syntactic or semantic sentence structure. The exploitation of tree bank data has been important ever since the first large-scale tree bank, The Penn Treebank, was published. However, although originating in computational linguistics, the value of tree bank is becoming more widely appreciated in linguistics research as a whole. F...
متن کاملThe Theory of Grammar Constraints
By introducing the Regular Membership Constraint, Gilles Pesant pioneered the idea of basing constraints on formal languages. The paper presented here is highly motivated by this work, taking the obvious next step, namely to investigate constraints based on grammars higher up in the Chomsky hierarchy. We devise an arc-consistency algorithm for context-free grammars, investigate when logic combi...
متن کاملGeneralized Probabilistic LR Parsing of Natural Language (Corpora) with Unification-Based Grammars
We describe work toward the construction of a very wide-coverage probabilistic parsing system for natural language (NL), based on LR parsing techniques. The system is intended to rank the large number of syntactic analyses produced by NL grammars according to the frequency of occurrence of the individual rules deployed in each analysis. We discuss a fully automatic procedure for constructing an...
متن کاملHighly Constrained Unification Grammars
Unification grammars are widely accepted as an expressive means for describing the structure of natural languages. In general, the recognition problem is undecidable for unification grammars. Even with restricted variants of the formalism, off-line parsable grammars, the problem is computationally hard. We present two natural constraints on unification grammars which limit their expressivity an...
متن کاملExploring Context-Sensitivity in Spatial Intention Recognition
In its most general form, the problem of inferring the intentions of a mobile user from his or her spatial behavior is equivalent to the plan recognition problem which is known to be intractable. Tractable special cases of the problem are therefore of great practical interest. Using formal grammars, intention recognition problems can be stated as parsing problems in a way that makes the connect...
متن کامل